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N  '  b  ? 


ABSTRACT 


This  paper  considers  the  problem  of  simulating  a  large 
networjfc  N  of  processors  using  a  small  set  of  p  processors. 

The  approach  taken  is  to  partition  the  nodes  of  N  into  p  sub¬ 
sets  ^ . ,Ng  and  to  assign  each  subset  to  a  processor  for 
simulation.  In  order  to  equalize  the  workloads  of  the  proces¬ 
sors,  the  sizes  ofN^,...,Hp  should  be  as  equal  as  possible; 
and  in  order  to  minimize  (and  equalize)  the  amount  of  message 
passing  between  the  processors,  the  number  of  pairs  of  nodes 
that  are  neighbors  in  N  but  belong  to  different  subsets  should 
be  as  small  (and  as  equal)  as  possible.  We  discuss  the  general 
problem  of  partitioning  a  graph  N  so  as  to  satisfy  these  cri¬ 
teria,  and  also  consider  the  particular  case  of  partitioning  a 
tree.  , 
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1.  Introduction 


There  is  a  growing  interest  in  the  study  (and  eventually, 
the  development)  of  large  networks  of  processors  for  application 
to  various  computational  tasks  [1].  For  example,  it  has  long 
been  recognized  [2]  that  image  processing  can  be  done  very 
efficiently  on  a  mesh-connected  network  of  the  same  size  as  the 
image  -  i.e.,  for  an  n  by  n  digital  image  one  should  use  an 
n  by  n  processor  array,  one  processor  per  pixel. 

Cost  considerations  limit  the  size  of  the  networks  that 
can  be  constructed  in  practice.  For  example,  the  largest 
mesh-connected  network  yet  constructed,  NASA's  Massively 
Parallel  Processor  [3],  is  an  array  of  128  by  128  processors; 
but  most  images  are  larger  than  this,  typically  512  by  512 
or  more. 

This  paper  addresses  the  problem  of  simulating  a  large 
network  N  of  processors  when  only  a  small  number  p  of  processors 
are  actually  available.  We  shall  assume,  for  simplicity,  that 
the  available  processors  can  be  connected  in  any  desired  way. 
Intuitively,  it  seems  reasonable  to  do  the  simulation  by  parti¬ 
tioning  the  nodes  of  N  into  p  sets  N^,...,Np,  and  let  each  pro¬ 
cessor  simulate  one  of  the  sets.  In  Section  2  we  shall  define 
criteria  that  such  a  partition  should  satisfy  in  order  to  mini¬ 
mize  the  computation  time  required  for  the  simulation.  Section 
3  considers  algorithms  for  partitioning  an  arbitrary  graph  based 
on  such  criteria,  Section  4  considers  the  special  case  of 
partitioning  a  tree,  and  Section  5  gives  some  further  special 
results  for  the  case  of  a  complete  binary  tree. 


2 .  Graph  partitioning 


The  general  graph  partitioning  problem  can  be  formulated  as 
follows:  We  are  given  a  graph  G  (with  node  set  N)  and  a  set  of 
p  processors,  which  we  can  interconnect  in  any  desired  way.  We 
want  to  partition  N  into  p  subsets  ,  and  assign  each 

set  to  one  of  the  processors.  We  must  also  interconnect  the 
processors  so  that  each  of  them  can  easily  obtain  information 
from  the  neighbors  of  its  nodes  in  G. 

In  defining  the  partition,  we  want  to  make  the  amounts  of 
computation  done  by  the  processors  as  equal  as  possible ,  in  order 
to  minimize  the  total  computation  time  (if  they  were  very  un¬ 
equal,  all  the  processors  would  have  to  wait  for  the  slowest  one) . 
Since  each  processor  must  simulate  its  nodes  one  at  a  time,  this 
suggests  that  the  numbers  of  nodes  assigned  to  the  processors 
should  be  as  equal  as  possible.  (This  ignores  the  fact  that  the 
simulation  may  not  take  the  same  time  for  all  nodes;  for  example, 
it  may  take  longer  for  a  node  of  high  degree  than  for  a  node  of 
low  degree,  since  a  high-degree  node  has  more  inputs.  To  take 
this  into  account,  one  could  require  that  the  sums  of  the  degrees 
of  the  nodes,  rather  than  the  number  of  nodes,  assigned  to  each 
processor  to  be  as  equal  as  possible.  For  simplicity,  however, 
we  shall  use  simple  equalization  of  the  numbers  of  nodes  in  our 
examples . ) 

We  also  want  to  minimize  interprocessor  communication,  and 
to  equalize  the  amounts  of  information  that  a  processor  may  need 
to  receive  from  the  other  processors  in  order  to  do  the  simulation, 


since  a  processor  can  receive  only  one  message  at  a  time. 

Let  us  assume,  for  simplicity,  that  to  simulate  a  node  of  G 
we  only  need  information  from  the  neighbors  (in  G)  of  that 
node.  Then,  we  would  like  to  define  the  partition  so  as  to 
minimize  the  number  of  neighbors  (in  G)  of  the  nodes  in  each 
set  Ni  that  lie  in  other  sets  Nj,  and  to  make  these  numbers 
as  equal  as  possible.  Moreover,  to  avoid  the  need  for  relay¬ 
ing  information  through  several  processors  (which  might  lead 
to  queueing  delays) ,  we  shall  assume  that  processors  i  and  j 
are  directly  connected  if  any  node  in  is  a  neighbor  of  any 
node  in  N . .  Thus  the  resulting  processor  network  G'  is  a  con¬ 
densation  of  the  graph  G;  each  node  i  of  G'  corresponds  to  a  set 
Ni  of  nodes  of  G,  and  nodes  i,j  of  G'  are  joined  by  an  arc  iff 
some  node  in  was  joined  by  an  arc  of  G  to  some  node  in  N  j  . 

Before  discussing  how  to  construct  such  condensations  for 
arbitrary  graphs,  let  us  briefly  consider  the  case  where  G  is 
a  mesh-connected  array.  Here,  to  minimize  the  number  of  neigh¬ 
bors  of  that  lie  in  Nj  for  each  pair  of  node  sets  (i^j), 
we  want  the  N^'s  to  be"compact"  blocks  of  nodes  that  contain 
as  few  border  nodes  (=nodes  with  neighbors  not  in  N^)  as  possi¬ 
ble.  It  is  not  hard  to  see  [4]  that  square  blocks  of  nodes 
are  best  for  this  purpose  -  i.e.,  they  have  the  fewest  border 
nodes  for  a  given  area.  Thus  to  efficiently  simulate  a  large 
mesh-connected  array,  we  should  condense  it  by  dividing  it  into 
square  blocks  of  equal  size;  note  that  such  a  condensation  is 


itself  a  mesh-connected  array.  Incidentally,  it  makes  little 
difference  whether  we  equalize  the  numbers  of  nodes  in  the 
blocks  or  the  sum  of  the  degrees  of  these  nodes,  since  most  of 
the  nodes  in  an  array  (i.e.,  all  but  the  nodes  on  its  border) 
have  the  same  degree. 


3.  Arbitrary  graphs 


The  problem  of  graph  condensation,  so  as  to  minimize  a  given 
cost  function  was  solved  in  a  general  setting  in  [5] .  The 
method  used  was  to  apply  dynamic  programming  to  solve  a  recur¬ 
sion  equation  expressing  the  changes  in  cost  variables  arising 
from  the  addition  of  a  single  node  to  one  of  the  previously  com¬ 
puted  partitions.  The  complexity  of  the  solution  grows  asymp¬ 
totically  as  xx  where  x  is  a  function  of  the  degree  of  the  graph. 
However,  in  the  particular  case  of  partitioning  trees,  a  linear 
time  solution  (linear  in  the  number  of  nodes  in  the  tree)  was  ob¬ 


tained.  It  is  therefore  of  interest  to  investigate  whether  there 
exist  heuristic  methods  which  produce  acceptable  solutions. 

In  this  section,  we  first  consider  a  method  of  graph  parti¬ 
tioning  designed  to  minimize  the  number  of  "outlinks"  between  the 
subsets,  i.e.  the  number  of  node  pairs  (a,b)  such  that  (a,b)  is 
an  arc  of  G  but  a  and  b  belong  to  different  subsets  of  the  parti¬ 
tion.  Our  approach  is  based  on  the  same  principles  as  Huffman 
coding  [6],  but  using  the  degrees  of  the  vertices.  The  idea  is 
to  form  groups  around  vertices  of  large  degrees,  repeatedly 
"merging"  other  vertices  together  until  the  number  of  vertices 
remaining  equals  the  number  of  host  vertices.  The  criterion  used 
to  pick  two  vertices  to  be  merged  is  derived  informally  as  follows: 

Suppose  that  A  is  a  vertex  with  large  degree  n,  and  suppose 
we  merge  A  with  some  partition  (of  which  A  is  a  neighbor) . 

In  so  doing,  we  are  adding  n-1  to  the  number  of  inputs  that  the 
processor  handling  must  deal  with.  Thus,  whenever  we  merge. 


we  should  merge  vertices  of  the  least  possible  degrees.  Also, 
to  preserve  connectedness  inside  components  we  insist  that 
the  vertices  to  be  merged  be  adjacent. 

The  definition  of  a  merge  between  two  vertices  of  a  graph 
G  is  as  follows: 

Function  merge  (G,u,v); 

(*  returns  a  graph  in  which  vertices  u  and  v  of  G  are  merged  *) 
begin 

E'  {(w,u)  |  (w,u)  £  E(G)>  U  {(w,v)  |  (w,v)  £  E  (G)  } 

V’  «-  {w  |  (w , u)  €  E(G)>  U  {w  !  (w,v)  €  E(G)> 

E"  *■  {  t[uv]  ,w)  |  w  £  V'} 

return  (V(G)  -  {u,v}  +  { [uv] } ,  E (G)  -  E'  +  E") 

end 

The  partitioning  algorithm  based  on  merging  pairs  of  least 
degree  is  then  as  follows: 

begin  {p  =  #  processors  available;  G=(V,E)  is  the  input  graph) 
G'  G; 

while  | V (G ' ) |  >p  do 

beg in  find  the  lexicographically  smallest  pair  (deg  (u)  , 
deg(v))  where  uv  £  E(G); 


G'  «-  merge  (G'  ,u,v) 


The  partition  found  in  this  way  is  not  necessarily  unique 
if  ties  are  resolved  arbitrarily  during  the  search  for  the 
smallest  pair  (deg(u),  deg(v)).  The  following  is  an  example 
of  applying  the  partitioning  algorithm  to  a  graph  G  for  p=3 : 


i 


Thus  the  final  partition  is  [2, 4, 7],  [1,3],  [3, 6].  In  this  par¬ 
tition,  the  nodes  of  [2,4,7]  have  4  outlinks;  those  of  [1,5] 
have  5;  and  those  of  [3,6]  have  5.  Thus  the  numbers  of  out¬ 
links  have  been  (approximately)  equalized. 

A  defect  of  this  algorithm  is  that  it  does  not  require  the 
sets  to  contain  approximately  equal  numbers  of  nodes.  For 
example,  if  p=4  and  G  is  a  complete  binary  tree  of  height  3, 
the  following  partition  can  be  obtained: 


To  obtain  a  more  equitable  distribution  of  nodes,  we  must  also 


take  into  account  the  number  of  nodes  in  the  sets  N^.  We  can 
do  this  by  using  a  linear  combination  of  weight  (=number  of 
nodes  in  )  and  degree  (of  the  node  )  as  the  criterion  for 
selecting  nodes  to  be  merged. 

Thus,  for  each  node  v,  we  have 

f (v)  =  a. weight (v)  +  8. degree (v) 
and  we  can  pick  the  constants  a  and  |3  to  "tune"  the  rtition. 

For  the  same  tree  considered  above,  the  followir  parti¬ 
tions,  in  all  of  which  the  penalty  is  2,  can  be  obta  .r  . : 


It  should  be  emphasized  that  the  partitions  obtained  in  this  way 
are  dependent  on  the  order  in  which  nodes  with  equal  f-values 
are  chosen.  Thus,  for  highly  symmetric  structures,  wildly  vary¬ 
ing  partitions  can  be  obtained. 


The  time  complexity  of  the  algorithm  is  determined  as  follows 

Assume  that  the  graph  has  n  nodes.  Thus  there  are  n  f-values 

and  the  least  is  found  in  0(n)  time.  The  loop  is  executed  (n-p) 

times  (assuming  n>p)  giving  an  overall  complexity  of  (n-p)  0(n) 

2 

which  is  0(n  )  for  fixed  p. 

It  has  been  found  empirically  that  having  P>a  usually  results 
in  components  with  very  uneven  distributions  of  nodes  per  compo¬ 
nent.  This  is  because  it  is  possible  for  a  component  with  a 
large  number  of  nodes  to  have  small  degree  (as  in  the  second  example 
above) .  Such  a  node  will  then  become  a  candidate  for  further 
merging,  adding  to  its  already  large  size.  Having  p=0  will 
merge  nodes  based  on  weight  alone,  without  regard  for  the  degrees 
of  the  nodes  being  merged.  This  will  lead  to  large  numbers  of 
outlinks.  Thus  it  is  best  to  have  a>(3  ,  p^O.  It  has  also  been 
found,  empirically,  that  as  long  as  a>(3  ,  increasing  the  value  of 
a  does  not  have  much  effect.  An  intuitive  explanation  for  this 
may  be  given  as  follows:  Suppose  that  {v^  , v2  ,  •  •  •  > vn }  is  the  set 
of  nodes  at  some  stage,  and  that  v^  and  v^  are  selected  for  merg¬ 
ing.  This  means  that  (f  (v^)  ,  f  (v ^  )  )  [or  (f  (v_. )  ,  f  (v^)  )  ]  is  the 

smallest,  lexicographically  -  that  is,  (a. weight (v^ +3 .deg (v^  , 
a .  weight  (v^.  ) +<3  .deg  (Vj  )  )  is  the  smallest.  Since  a  is  "large", 
the  small  f-value  of  v^  (or  v ^ )  must  be  due  to  a  "small"  value 
of  weight  (v^)  (or  weight  (v..  )  )  .  if  we  now  increase  a  to  a+k 
(k>0),  the  set  of  f-values  becomes  { f (v^) +k. weight (v^) ,..., f (vn) + 
k. weight (vn) }. If  vi  and  v^  were  picked  with  the  old  value  of  a, 
because  of  the  small  weight  of  v^  the  increase  caused  by  adding 
k. weight  (v^)  is  usually  not  enough  to  keep  v^  and  v^  from  still 
having  the  smallest  f-values. 


4. 


Trees 


> 


J|  For  trees,  a  solution  to  the  partitioning  problem  is  given 

in  [5]  which  is  linear  in  the  number  of  nodes  in  the  tree.  In¬ 
teger  weights  are  attached  to  both  nodes  and  arcs  of  the  tree, 
3  and  the  solution  finds  a  partition  in  which  the  weight  of  each 

component  is  less  than  or  equal  to  a  parameter,  w,  and  the  sum 
of  the  weights  on  the  arcs  that  form  outlinks  is  minimal. 

[ai  For  example ,  consider  the  tree 

i 

m 


Here,  all  nodes  have  unit  weight,  w=3,  and  the  numbers  along¬ 
side  the  arcs  are  their  weights.  In  this  case,  the  following 
optimal  partition  is  obtained: 


where  we  have  5  components,  each  with  53  nodes.  The  cost  of 
the  outlinks  is  l+2+4+l=8  ,  and  this  is  the  least  possible  value. 
However,  the  distribution  of  nodes  per  component  is  not  even. 

For  example,  if  we  consider  the  tree 


with  unit  weights  on  the  arcs  and  nodes  and  with  w=3 ,  the  solu¬ 
tion  is  optimal  in  terms  of  outlines,  but  has  an  uneven  distri¬ 
bution  of  nodes.  To  achieve  a  more  even  distribution  of  nodes 
per  component,  the  tree  nodes  must  be  weighted  differently  so 
that  clusters  are  broken  up.  In  practice,  it  is  difficult  to 
determine  just  what  this  weighting  should  be. 

If  we  do  not  use  weigh tings,  then  any  partition  of  a  tree 

into  p  connected  components  has  the  same  number  of  outlinks, 

namely  p-1.  To  see  this,  suppose  that  a  tree  G  having  n  nodes 

is  partitioned  into  p  connected  components;  thus  each  component 

is  itself  a  tree.  If  component  has  n^  nodes,  lsisp,  we  must 

have  Z  n.  =  n.  The  total  number  of  arcs  internal  to  the 
lsisp 

components  is  Z  (n.-l)  =  n-p.  Since  the  total  number  of 
l5i<p  1 

arcs  in  the  tree  G  is  n-1,  there  must  be  (n-1) -  (n-p)=p-l  arcs 
not  internal  to  any  component.  These  are  just  the  arcs  connect¬ 
ing  together  the  p  components.  Thus  the  total  number  of  out¬ 
links  is  p-1. 

In  many  cases,  it  is  not  optimal  to  insist  on  connected 
components  in  a  partition  of  a  tree.  For  example,  an  optimal 
partition  of  the  tree  shown  below  into  three  components  results 
in  one  of  the  components  being  disconnected.  It  can  be  veri¬ 
fied  that  in  any  partition  of  this  tree  into  subgraphs  having 
2,2,  and  3  nodes,  at  least  one  of  the  subgraphs  must  be  dis¬ 


connected. 


5 .  Complete  binary  trees 

In  this  section  we  consider  methods  of  partitioning  complete 
binary  trees.  We  recall  that  a  complete  binary  tree  has  2m-l 
nodes  for  some  m:  1  root  node,  2  nodes  at  the  level  below  the 
root,  4  nodes  at  the  level  below  that,  ...  ,2  nodes  k  levels 
below  the  root,  ...  ,  and  2  leaf  nodes. 

We  saw  in  Section  4  that  any  partition  of  a  tree  into  p 
connected  parts  has  p-1  outlinks.  Thus  the  figure  of  merit  of 
such  a  partition  depends  only  on  how  nearly  equal  are  the 
numbers  of  nodes  in  the  parts;  the  best  partition  would  be  one 
in  which  all  the  parts  are  of  the  same  size.  We  now  show  that 
for  a  complete  binary  tree,  there  is  only  one  way  of  partition¬ 
ing  it  into  connected  parts  that  are  all  of  the  same  size. 

Proposition.  Suppose  that  a  complete  binary  tree  is  partitioned 
into  p  connected  parts  all  of  the  same  size.  Then  the  parts  are 
all  complete  binary  trees,  say  of  size  2  -1,  and  their  roots 
are  at  levels  0,h,2h,...  below  the  root  of  the  given  tree.  In 
particular,  we  must  have  p  =  l+2^+2^^+ . . . 2m  h  divides  m, 
and  2m-l=p(2h-l)  . 

Proof :  Let  A  be  a  leaf  of  the  tree,  e.g.,  as  shown  (partially) 


below: 


If  {A}  is  a  part,  the  parts  are  all  singletons  and  the  propo¬ 
sition  is  trivially  true  with  h=l.  If  not,  the  part  NA  con¬ 
taining  A  must  also  contain  E,  since  E  is  the  only  node  adjacent 
to  A.  But  then  NA  must  also  contain  B,  since  otherwise  B  could 
not  belong  to  a  connected  part  having  at  least  two  nodes.  If 
Na  =  {A,B,E} ,  it  is  a  complete  binary  tree  having  three  nodes 
and  the  parts  all  contain  three  nodes.  A  similar  argument  now 
shows  that  the  part  containing  C  must  be  {C,D,F},  and  similarly 
each  pair  of  leaves  having  a  common  father  must  constitute  a 
part.  If  we  remove  these  parts,  what  remains  is  a  complete  bi¬ 
nary  tree  of  height  m-2,  and  we  can  repeat  the  argument  for  that 
tree;  we  conclude  ultimately  that  the  parts  are  all  complete 
binary  trees  having  three  nodes  (i.e.,  h=2)  ,  and  that  m  is  even 
(i.e.,  h  divides  m) . 

Suppose  next  that  has  more  than  three  nodes;  then  it 
must  contain  G.  But  then  {C,D,F}  cannot  belong  to  a  connected 
part  having  more  than  three  nodes;  thus  these  nodes  too  must 
belong  to  NA,  so  that  it  has  at  least  seven  nodes.  If  it  has 
exactly  seven,  it  is  a  complete  binary  tree  having  seven  nodes 
{A,B,C,D,E,F,G} ,  and  all  the  parts  have  seven  nodes.  An  argu¬ 
ment  analogous  to  that  in  the  previous  paragraph  then  shows 
that  all  the  parts  are  complete  binary  trees  having  seven  nodes 
(i.e.,  h=3),  and  that  m  is  a  multiple  of  3. 

We  can  repeat  this  reasoning  to  show  that  if  NA  has  more  than 
seven  nodes,  it  has  at  least  15,  in  which  case  all  the  parts  are 
complete  binary  trees  having  15  nodes  (h=4);  if  it  has  more  than 
15,  it  has  at  least  31;  and  so  on.  Thus  the  Proposition  is  true 
in  any  case.  || 


Note  that  if  we  relax  the  condition  in  the  proposition 
to  allow  just  one  part  to  have  a  different  number  of  nodes 
than  all  the  others,  the  parts  need  no  longer  be  complete 
binary  trees.  For  example,  in 


two  parts  have  three  nodes  each  and  the  third  part  has  one, 
but  one  of  the  parts  is  not  a  complete  binary  tree.  However, 
there  does  always  exist  a  partition  of  a  complete  binary  tree 

.  HI 

of  size  2  -1  into  complete  binary  trees,  all  but  (at  most) 

one  of  which  are  of  size  2h-l,  for  any  desired  hsm.  Indeed, 

let  m=kh+r,  where  Osrsh.  Then  the  parts  consist  of:  a  tree 

of  size  2r-l  rooted  at  the  root;  2r  trees  of  size  2h-l  rooted 

r  "f  h  h 

r  levels  below  the  root;  2r  trees  of  size  2-1  rooted  r+h 

r+2h 

levels  below  the  root;  2  trees  of  size  2-1  rooted  r+2h 

levels  below  the  root;  ...;  2r+ ^  h  trees  of  size  2h-l  rooted 
r+(k-l)h  levels  below  the  root. 

Unfortunately,  partitions  of  a  complete  binary  tree  into 
connected  parts  are  not  always  optimal  even  if  the  parts  are  all 
of  exactly  the  same  size.  For  example,  as  shown  below,  the  com¬ 
plete  binary  tree  with  63  nodes  can  be  partitioned  into  9  compo¬ 
nents  of  7  nodes  each,  where  the  maximum  number  of  outlinks  from 
any  component  is  5  (arising  from  component  no.  9;  the  nodes  are 
labelled  with  their  component  numbers).  On  the  other  hand,  if 
we  partition  this  tree  into  9  complete  binary  trees  of  7  nodes 


like  the  complete  subtrees  in  a  set  to  all  be  of  the  same  size. 
This,  combined  with  the  fact  that  nothing  is  gained  by  placing 
the  equal-sized  subtrees  at  different  levels  (the  number  of  out- 
links  cannot  decrease) ,  and  that  placing  the  complete  subtrees 
"adjacent"  to  each  other  preserves  the  original  tree  structure 
within  a  component,  leads  us  to  consider  forming  each  set 
from  equally  sized  complete  subtrees  at  the  same  level.  We 
shall  refer  to  such  as  "trapezoidal"  blocks. 

To  completely  specify  the  shane  of  a  trapezoidal  block,  we 
must  specify  the  number  of  nodes  in  its  top  edge  and  the  number 
of  its  layers.  It  is  possible  that  differently  shaped  trapezoidal 
blocks  can  have  the  same  number  of  nodes.  In  such  cases,  the 
block  with  fewest  outlinks  should  be  picked. 

Suppose  that  we  have  two  trapezoidal  blocks,  one  with 
nodes  in  its  top  edge  and  layers  deep,  and  the  other 
with  m2  nodes  in  its  top  edge  and  l2  layers  deep.  Suppose 
further  that  both  contain  the  same  number  of  nodes.  Thus 

m^(2  1-1)  =  m2(2  2-l)  .  Also,  assume  that  m^n^;  thus  £2>£^. 

1 1 

The  number  of  outlinks  for  the  first  block  is  m^ (2  i+l)and  for 
the  second  it  is  m2(2  ^+1). 


Since  m^>m2  we  have 


2^1m^+m^  > 

2  1m^ 

-  mx 

+ 

m2  +  m2 

> 

2£2m2 

-  m2 

+ 

2m2 

> 

2£2m2 

+  m2 

Hence  the  first  trapezoidal  block  will  have  more  outlinks  than 


the  second. 


Thus,  we  should  use  trapezoids  that  have  as  many  layers 
as  possible,  and  tops  as  small  as  possible  -  in  other  words, 
they  should  consist  of  as  few  trees  as  possible. 

The  following  algorithm  determines  suitable  parameters  for 
the  trapezoid: 

2m-i 

begin  i  T - 1  ;  a  ■*-  0 ; 

P 

while  i  7*  (2  -1)  for  some  b  do 
begin  a  *■  a+1;  i  *•  r i/2 1 
end; 

Jj 

if  i  =  2-1  for  some  b  then 
begin 

number  of  nodes  in  top  edge  =  2a; 
number  of  layers  =  b 

end 

end 

Clearly,  the  algorithm  stops  for  the  least  possible  value 
of  a  and  the  highest  possible  value  of  b.  The  condition 
i=2^-l  for  some  b  is  easily  detected  from  the  binary  represen¬ 
tation  of  b,  and  is  needed  to  obtain  complete  subtrees  within 
each  trapezoid.  Note  that  if  p  divides  2m-l  evenly,  we  have 
2  ^  1 

i  =  ) p  =  2m-l  and  the  while  Iood  is  not  executed.  We 

p  r  - 

then  have  a=0,  so  that  the  trapezoidal  blocks  are  complete 
binary  subtrees. 

The  trapezoidal  blocks  obtained  by  this  algorithm  are  not 


optimal  in  the  number  of  outlinks.  For  example,  if  p=2, 
the  following  partition  of  a  complete  binary  tree  is  obtained: 


I 


I 


J 


> 


with  a  penalty  of  4, whereas  the  optimal  partition  is 


with  a  penalty  of  1. 

The  following  algorithm*  partitions  the  n  nodes  of  a  com¬ 
plete  binary  tree  T  such  that  the  nodes  are  distributed  to  the 
p  processors  as  evenly  as  possible,  and  it  gives  an  upper  bound 
to  the  cost  (maximum  number  of  outlinks)  of  partitioning  a  com¬ 
plete  binary  tree. 

k 

Step  1.  Find  x.*{0,l}  such  that  t  =  —  =  2  x-  21 
1  IPJ  i=0  1 

where  x^yo.  Clearly  k=tlog2tj. 

Step  2.  To  each  processor,  for  each  i,l£i£k,such  that 

x^=l,  assign  a  complete  subtree  with  21-!  nodes 
such  that  its  leaves  are  also  leaves  of  the  ori¬ 
ginal  tree  T.  There  are  enough  leaves  in  T, 

because  if  x.=l,  then  we  need  2^  ^  leaves  for 

1  k  i-1 

each  processor.  Altogether  p  Z  x^  21  leaves 

k  •  ,i=1  '  k 

are  needed.  But  p  Z  x-  2  =  (p  Z  x.21)/2  s 

i=l  i=l 

pt/2  s  n/2  s  number  of  leaves  in  T. 

*This  algorithm  is  based  in  part  on  suggestions  by  Q.  Stout. 


Step  3.  Let  s  =  number  of  nonzero  x^'s,  Osisk  (ssk+l). 

Among  the  unassigned  nodes  of  T,  assign  s  nodes 
(in  any  convenient  way)  to  each  of  the  p  pro¬ 
cessors.  Now  each  processor  has  t  nodes,  and 
there  are  n-pt  unassigned  nodes. 

Step  4.  Choose  (arbitrarily)  n-pt  processors  to  receive  one 
more  node  each.  In  other  words,  n-pt  processors 
have  t+1  nodes  and  (t+l)p-n  processors  have  t  nodes. 

Each  of  the  complete  subtrees  in  step  2  has  one  outlink 

and  the  number  of  such  complete  subtrees  in  each  processor 

is  Sk.  Each  node  assigned  in  steps  3  and  4  introduces  at  most 

three  outlinks.  Thus,  the  total  number  of  outlinks  in  each 

processor  is  £k+3 (s+1 ) s4k+6=4 i log— — J +6 .  In  general,  this 

P 

cost  can  often  be  smaller  if  in  the  assignment  of  nodes  in 
steps  3  and  4,  we  are  careful  in  assigning  neighboring  nodes 
to  the  same  processor  whenever  possible.  However,  this  method 
of  partitioning  does  not  always  result  in  optimal  cost,  as  the 
following  example  shows;  the  cost  of  this  partition  is  7 ,  aris¬ 
ing  from  component  9 ,  whereas  the  partition  of  this  same  tree 
shown  earlier  in  this  section  had  a  cost  of  only  5. 


The  above  algorithm  gives  an  upper  bound  on  the  cost  of 
partitioning  a  complete  binary  tree.  The  components  obtained 
are  very  "nonuniform"  in  the  sense  that  the  components  can  be 
disconnected;  the  nodes  in  each  component  are  connected  into 
graphs  of  different  shapes;  and  the  nodes  are  all  on  different 
levels  of  the  original  tree.  The  resulting  structure  is  also 
not  a  tree. 


I 


i 


6 .  Concluding  remarks 

In  order  to  simulate  a  large  network  N  of  processors 
using  a  smaller  set  of  processors,  a  natural  approach  is  to 
divide  N  into  parts  and  let  each  processor  simulate  one  of 
the  parts.  In  order  to  equalize  the  workloads  of  the  processors, 
the  number  of  nodes  of  N  in  the  parts  should  be  as  equal  as  possi¬ 
ble,  and  the  numbers  of  arcs  between  parts  should  be  as  small  (and 
as  equal)  as  possible.  "Natural"  partitions  that  are  optimal  with 
respect  to  these  criteria  can  be  found  for  certain  special  N's; 
for  example,  if  N  is  a  mesh-connected  array,  partitioning  it  into 
square  blocks  is  optimal.  For  general  graphs,  finding  an  optimal 
partition  has  exponential  cost.  We  suggest  a  suboptimal  parti¬ 
tioning  algorithm  that  has  quadratic  complexity.  For  trees, 
an  algorithm  of  linear  complexity  can  be  given  that  minimizes  the 
number  of  arcs  between  parts  while  keeping  the  number  of  nodes 
in  each  part  bounded,  but  it  is  not  obvious  how  to  modify  this 
algorithm  to  equalize  the  number  of  nodes  in  the  parts.  In  any 
case,  in  an  optimal  partition  of  a  tree,  the  parts  in  general 
are  not  connected  subgraphs.  We  give  a  suboptimal  algorithm  for 
partitioning  a  complete  binary  tree,  and  also  discuss  a  class  of 
"natural"  partitions  of  such  a  tree  into  complete  binary  trees 
or  into  "trapezoidal"  sets  of  such  trees. 
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